NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Optimal ensemble construction for multistudy prediction with applications to mortality estimation

https://doi.org/10.1002/sim.10006

Loewinger, Gabriel; Nunez, Rolando Acosta; Mazumder, Rahul; Parmigiani, Giovanni (April 2024, Statistics in Medicine)

It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets before model fitting can produce poor out‐of‐study prediction performance when datasets are heterogeneous. Theoretical and applied work has shownmultistudy ensemblingto be a viable alternative that leverages the variability across datasets in a manner that promotes model generalizability. Multistudy ensembling uses a two‐stagestackingstrategy which fits study‐specific models and estimates ensemble weights separately. This approach ignores, however, the ensemble properties at the model‐fitting stage, potentially resulting in performance losses. Motivated by challenges in the estimation of COVID‐attributable mortality, we proposeoptimal ensemble construction, an approach to multistudy stacking whereby we jointly estimate ensemble weights and parameters associated with study‐specific models. We prove that limiting cases of our approach yield existing methods such as multistudy stacking and pooling datasets before model fitting. We propose an efficient block coordinate descent algorithm to optimize the loss function. We use our method to perform multicountry COVID‐19 baseline mortality prediction. We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy. We further compare and characterize the method's performance in data‐driven simulations and other numerical experiments. Our method remains competitive with or outperforms multistudy stacking and other earlier methods in the COVID‐19 data application and in a range of simulation settings.
more » « less
Full Text Available
Grouped variable selection with discrete optimization: Computational and statistical perspectives

Hazimeh, Hussein; Mazumder, Rahul; Radchenko, Peter (February 2023, The Annals of Statistics)

Full Text Available
Sparse regression at scale: branch-and-bound rooted in first-order optimization

Hazimeh, Hussein; Mazumder, Rahul; Saab, Ali (November 2022, Mathematical Programming)

Full Text Available
Integration of survival data from multiple studies

https://doi.org/10.1111/biom.13517

Ventz, Steffen; Mazumder, Rahul; Trippa, Lorenzo (June 2021, Biometrics)
null (Ed.)
Full Text Available
Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

Mazumder, Rahul; Radchenko, Peter; Dedieu, Antoine. (June 2021, ArXivorg)

Full Text Available
Discussion of “Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons”

https://doi.org/10.1214/20-STS807

Mazumder, Rahul (November 2020, Statistical Science)
null (Ed.)
Full Text Available
Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

https://doi.org/10.1287/opre.2019.1919

Hazimeh, Hussein; Mazumder, Rahul (September 2020, Operations Research)
null (Ed.)
The L 0 -regularized least squares problem (a.k.a. best subsets) is central to sparse statistical learning and has attracted significant attention across the wider statistics, machine learning, and optimization communities. Recent work has shown that modern mixed integer optimization (MIO) solvers can be used to address small to moderate instances of this problem. In spite of the usefulness of L 0 -based estimators and generic MIO solvers, there is a steep computational price to pay when compared with popular sparse learning algorithms (e.g., based on L 1 regularization). In this paper, we aim to push the frontiers of computation for a family of L 0 -regularized problems with additional convex penalties. We propose a new hierarchy of necessary optimality conditions for these problems. We develop fast algorithms, based on coordinate descent and local combinatorial optimization, that are guaranteed to converge to solutions satisfying these optimality conditions. From a statistical viewpoint, an interesting story emerges. When the signal strength is high, our combinatorial optimization algorithms have an edge in challenging statistical settings. When the signal is lower, pure L 0 benefits from additional convex regularization. We empirically demonstrate that our family of L 0 -based estimators can outperform the state-of-the-art sparse learning algorithms in terms of a combination of prediction, estimation, and variable selection metrics under various regimes (e.g., different signal strengths, feature correlations, number of samples and features). Our new open-source sparse learning toolkit L0Learn (available on CRAN and GitHub) reaches up to a threefold speedup (with p up to 10 6 ) when compared with competing toolkits such as glmnet and ncvreg.
more » « less
Full Text Available
Mining events with declassified diplomatic documents

https://doi.org/10.1214/20-AOAS1344

Gao, Yuanjun; Goetz, Jack; Connelly, Matthew; Mazumder, Rahul (December 2020, The Annals of Applied Statistics)
null (Ed.)
Full Text Available
Computing the degrees of freedom of rank-regularized estimators and cousins

https://doi.org/10.1214/20-EJS1681

Mazumder, Rahul; Weng, Haolei (January 2020, Electronic Journal of Statistics)
null (Ed.)
Full Text Available
Randomized Gradient Boosting Machine

https://doi.org/10.1137/18M1223277

Lu, Haihao; Mazumder, Rahul (January 2020, SIAM Journal on Optimization)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records